1.
Genomics & Informatics ; : e14-2019.
Article in English | WPRIM | ID: wpr-763810

ABSTRACT

The total number of scholarly publications grows day by day, making it necessary to explore and use simple yet effective ways to expose their metadata. Schema.org supports adding structured metadata to web pages via markup, making it easier for data providers but also for search engines to provide the right search results. Bioschemas is based on the standards of schema.org, providing new types, properties and guidelines for metadata, i.e., providing metadata profiles tailored to the Life Sciences domain. Here we present our proposed contribution to Bioschemas (from the project “Biotea”), which supports metadata contributions for scholarly publications via profiles and web components. Biotea comprises a semantic model to represent publications together with annotated elements recognized from the scientific text; our Biotea model has been mapped to schema.org following Bioschemas standards.
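As a rough illustration of what schema.org markup for a scholarly publication can look like, the sketch below builds a JSON-LD description and wraps it in a script tag for embedding in a web page. It uses only the generic schema.org `ScholarlyArticle` and `Periodical` types; it is not the Bioschemas profile or the Biotea model described in the abstract. The helper names `scholarly_article_jsonld` and `to_script_tag` and the placeholder title are assumptions made for this example.

```python
import json


def scholarly_article_jsonld(title, journal, identifier, keywords):
    """Build a generic schema.org ScholarlyArticle description (hypothetical helper)."""
    return {
        "@context": "https://schema.org",
        "@type": "ScholarlyArticle",
        "headline": title,
        "identifier": identifier,
        "inLanguage": "en",
        "keywords": ", ".join(keywords),
        "isPartOf": {"@type": "Periodical", "name": journal},
    }


def to_script_tag(data):
    """Serialize the description as a JSON-LD script element for an HTML page."""
    body = json.dumps(data, indent=2, ensure_ascii=False)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


record = scholarly_article_jsonld(
    title="Example article title",  # placeholder, not the real article title
    journal="Genomics & Informatics",
    identifier="wpr-763810",
    keywords=["Biological Science Disciplines", "Search Engine", "Semantics"],
)
print(to_script_tag(record))
```

A search engine that reads JSON-LD can pick up the type, identifier and keywords from such a block without parsing the visible page text.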


Subject(s)
Biological Science Disciplines , Search Engine , Semantics
2.
Genomics & Informatics ; : e19-2019.
Article in English | WPRIM | ID: wpr-763805

ABSTRACT

In this paper, we investigate cross-platform interoperability for natural language processing (NLP) and, in particular, annotation of textual resources, with an eye toward identifying the design elements of annotation models and processes that are particularly problematic for, or amenable to, enabling seamless communication across different platforms. The study is conducted in the context of a specific annotation methodology, namely machine-assisted interactive annotation (also known as human-in-the-loop annotation). This methodology requires the ability to freely combine resources from different document repositories, access a wide array of NLP tools that automatically annotate corpora for various linguistic phenomena, and use a sophisticated annotation editor that enables interactive manual annotation coupled with on-the-fly machine learning. We consider three independently developed platforms, each of which utilizes a different model for representing annotations over text, and each of which performs a different role in the process.


Subject(s)
Linguistics , Machine Learning , Natural Language Processing
3.
Genomics & Informatics ; : e20-2019.
Article in English | WPRIM | ID: wpr-763804

ABSTRACT

Entity normalization, or entity linking in the general domain, is an information extraction task that aims to annotate/bind multiple words/expressions in raw text with semantic references, such as concepts of an ontology. An ontology consists minimally of a formally organized vocabulary or hierarchy of terms, which captures knowledge of a domain. Presently, machine-learning methods, often coupled with distributional representations, achieve good performance. However, these require large training datasets, which are not always available, especially for tasks in specialized domains. CONTES (CONcept-TErm System) is a supervised method that addresses entity normalization with ontology concepts using small training datasets. CONTES has some limitations: it does not scale well with very large ontologies, it tends to overgeneralize predictions, and it lacks valid representations for out-of-vocabulary words. Here, we propose to assess different methods to reduce the dimensionality in the representation of the ontology. We also propose to calibrate parameters in order to make the predictions more accurate, and to address the problem of out-of-vocabulary words with a specific method.


Subject(s)
Dataset , Information Storage and Retrieval , Methods , Semantics , Vocabulary
4.
Genomics & Informatics ; : e40-2018.
Article in English | WPRIM | ID: wpr-739673

ABSTRACT

There is a communal need for an annotated corpus consisting of the full texts of biomedical journal articles. In response to community needs, a prototype version of the full-text corpus of Genomics & Informatics, called GNI version 1.0, has recently been published, with 499 annotated full-text articles available as a corpus resource. However, GNI needs to be updated, as the texts were shallow-parsed and annotated with several existing parsers. I list issues associated with upgrading annotations and give an opinion on the methodology for developing the next version of the GNI corpus, based on a semi-automatic strategy for more linguistically rich corpus annotation.


Subject(s)
Genomics , Informatics
5.
Genomics & Informatics ; : 75-77, 2018.
Article in English | WPRIM | ID: wpr-716819

ABSTRACT

Genomics & Informatics (NLM title abbreviation: Genomics Inform) is the official journal of the Korea Genome Organization. A text corpus for this journal, annotated with various levels of linguistic information, would be a valuable resource, as the process of information extraction requires syntactic, semantic, and higher levels of natural language processing. In this study, we publish our new corpus, called GNI Corpus version 1.0, extracted and annotated from full texts of Genomics & Informatics using an NLTK (Natural Language Toolkit)-based text mining script. The preliminary version of the corpus could be used as a training and testing set for a system that serves a variety of functions for future biomedical text mining.


Subject(s)
Data Mining , Genome , Genomics , Informatics , Information Storage and Retrieval , Korea , Linguistics , Natural Language Processing , Semantics
6.
Journal of Medical Informatics ; (12): 74-79, 2017.
Article in Chinese | WPRIM | ID: wpr-619657

ABSTRACT

The paper takes as its data sources the reports and conference proceedings discussed by domain experts at the 2015-2016 International Biocuration Conference, together with the research literature on biocuration and data biocuration in PubMed Central from the past 5 years. Using the content analysis method, it analyzes and summarizes the research subjects of biocuration, focusing on the working mechanisms of biocuration, construction and application, integration and visualization, the review, editing and application of biomedical data standards, and the mining of biomedical texts, in order to provide international experience for the development of biocuration in China.

7.
Chinese Journal of Medical Library and Information Science ; (12): 28-32, 2015.
Article in Chinese | WPRIM | ID: wpr-482029

ABSTRACT

Five genes closely related to leukemia were detected and identified using COREMINE Medical, and the abstracts of related papers indexed in PubMed were analyzed with the biomedical text mining tool Chilibot, which showed that leukemia interacts with the 5 genes detected using COREMINE Medical.

8.
Genomics & Informatics ; : 99-106, 2004.
Article in English | WPRIM | ID: wpr-217504

ABSTRACT

In this paper we introduce PubMiner, an intelligent machine learning-based text mining system for mining biological information from the literature. PubMiner employs natural language processing techniques and machine learning-based data mining techniques to mine useful biological information, such as protein-protein interactions, from the massive literature. The system recognizes biological terms such as genes, proteins, and enzymes and extracts their interactions described in the document through natural language processing. The extracted interactions are further analyzed with a set of features of each entity, collected from the related public databases, to infer more interactions from the original interactions. Both the inferred interactions from the interaction analysis and the native interactions are provided to the user with links to the literature sources. The performance of entity and interaction extraction was tested with selected MEDLINE abstracts. The evaluation of inference was carried out using the protein interaction data of S. cerevisiae (baker's yeast) from MIPS and SGD.


Subject(s)
Data Mining , Mining , Natural Language Processing , Machine Learning